Abstract

The popularization of cloud computing has brought a concern over the
energy consumption in data centers. In addition to the energy consumed
by servers, the energy consumed by the large number of network devices
emerges as a significant problem. Existing work on energy-efficient data
center networking primarily focuses on traffic engineering, which is usually
adapted from traditional networks. We propose a new framework to embrace
the new opportunities brought by combining some special features
of data centers with traffic engineering. Based on this framework, we
characterize the problem of achieving energy efficiency with a time-aware
model, prove its NP-hardness and solve it in two steps. First, we solve
the problem of assigning virtual machines (VMs) to servers to reduce the
amount of traffic and to generate favorable conditions for energy-efficient
routing. The solution proposed for this problem is based on three essential
principles we propose. Second, we reduce the number of active switches
and balance traffic flows, depending on the relation between power
consumption and routing, to achieve energy conservation. Experimental
results show that, by using the whole framework, we can achieve up to 50%
energy savings. Extensions to general cases show that our method is
scalable and practical in use.

Keywords: Data Center Networks, Energy Efficiency, Virtual Machine 
Assignment, Traffic Engineering 

1 Introduction 

Data centers are integrated facilities that house computer systems for cloud
computing, and have been widely deployed by large companies such as Google,
Yahoo! and Amazon. The energy consumption of data centers has become an
essential problem. It is shown in [?] that the electricity used in global data
centers in 2010 likely accounted for between 1.1% and 1.5% of total electricity
use, and it is still increasing. However, while energy-saving techniques for
servers have evolved, the energy consumption of the enormous number of network
devices used to interconnect servers has emerged as a substantial issue. Abts
et al. [?] showed that in a typical data center from Google, the network
accounts for around 20% of the total power when the servers are 100% utilized,
but this share increases to 50% when the utilization of servers decreases to
15%, which is quite realistic in production data centers. Therefore, improving
the energy efficiency of the network also becomes a primary concern.

There is a large body of work in the field of energy efficiency in Data Center
Networks (DCNs). While some energy-efficient topologies have been proposed
([?], [?]), most of the work focuses on traffic engineering, trying to
consolidate flows onto a subset of links and switch off unnecessary switches
([?], [?], [?], [?]). These solutions are generally based on characterizing the
traffic pattern by prediction, which is usually infeasible or not precise enough
because traffic patterns vary significantly across applications.

We believe that in order to improve the energy efficiency in DCNs, the unique 
features of data centers should be explored and utilized. In particular, we think 
that the following features should be taken into account: 

a) Regularity of the topology: compared to traditional networks, DCNs using
new topologies, such as fat-tree [?], BCube [?] and DCell [?], are more regular
and symmetric. As a result, it is possible for us to have better knowledge
about the physical network.

b) VM assignment: due to virtualization, we can determine the endpoints of 
traffic flows, which will have a remarkable influence on the traffic carried by 
the network and will, consequently, condition traffic engineering. 

c) Application characteristics: most applications in cloud data centers are run
under the MapReduce paradigm [?], which brings about regular and recognizable
communication patterns. Making use of these characteristics can help us avoid
traffic prediction and obtain better traffic engineering results.

In order to fully seize these new opportunities, we propose a new general
framework (as illustrated in Figure 1) for achieving energy efficiency in DCNs,
where the particular information of both the applications and the network will
be deeply explored and coherently utilized. We will carefully design the VM
assignment based on a comprehensive understanding of the application
characteristics, combined with the aforementioned network features (e.g.
topology, end-to-end connectivity and communication bandwidth). This purposeful
VM assignment will provide us with favorable traffic conditions on the DCN
and thus gain us some energy savings before carrying out the traffic
engineering on the network. Then, we will explore particular traffic engineering
solutions according to the specific traffic patterns and network features.

Figure 1: A new general framework for achieving energy efficiency in DCNs



The main contributions of this paper are highlighted as follows. First, we
provide a new general framework for energy minimization in DCNs. We also
carry out an exhaustive analysis of how to proceed with this framework and
enumerate new issues and challenges. Second, we model the energy-saving problem
in DCNs by using this new framework and prove its hardness. Third, we provide
an in-depth analysis of both VM assignment and network routing with respect to
energy conservation, showing that there is considerable room for improving the
energy efficiency by making use of some unique features of data centers. Fourth,
based on the analytical results, we provide efficient algorithms to solve the
problem. We also conduct comprehensive experiments to evaluate the efficiency of
our method.

The rest of this paper is organized as follows. In Section 2 we describe the
general framework and discuss how it can be realized. In addition, we list some
newly arising issues. In Section 3 we present a time-aware model to describe
the energy-saving problem in DCNs based on the new framework and analyze its
hardness. We explore VM assignment principles for energy saving and provide a
traffic-aware energy-efficient VM assignment algorithm in Section 4. The routing
optimization is addressed in Section 5, where we present a detailed theoretical
analysis and provide a two-phase energy-efficient routing algorithm. Section 6
provides the experimental results and Section 7 presents some extended
discussions on the practicality of our algorithms. In Section 8 we summarize
related work and in Section 9 we draw a brief conclusion.



2 The General Framework 



Although we are considering the problem of achieving energy efficiency in DCNs,
this framework can also be generalized to most performance optimization
problems in DCNs. In this section, we discuss in general how to conduct
optimization work by using this framework and list some new challenges. The
structure of this new framework is illustrated in Figure 1.

Figure 2: A fat-tree topology with 16 servers connected

Applications. As an important paradigm for large-scale data processing,
MapReduce [?] has been widely applied in modern cloud data centers of many
big companies. Most cloud applications have been ported to MapReduce. For
this reason, we focus our attention on typical MapReduce jobs. A typical
MapReduce job comprises three main phases: Map, Shuffle and Reduce. The
network is only intensively used in the Shuffle phase in order to exchange
intermediate results between processors. As a result, MapReduce-type
applications usually exhibit regular communication patterns. Xie et al. [?]
profiled the network patterns of several typical MapReduce jobs, including
Sort, Word Count, Hive Join, and Hive Aggregation, which represent an important
class of applications residing in data centers. They observed that all these
jobs generate substantial traffic during only 30%-60% of the entire execution,
and the traffic patterns of these jobs can mainly be classified into three
categories: single peak, repeated fixed-width peaks, and peaks of varying
height and width. With this in mind, the network traffic can be scheduled in
advance, which can significantly change the traffic engineering results.

The characteristics of applications can be obtained by profiling runs of jobs.
The detailed profiling method is out of the scope of this paper, but one possible
realization can be found in [?]. The profiling process brings an unavoidable
overhead, but it can be drastically reduced if the same types of jobs with
the same input size are run repeatedly. We observe that such a scenario is quite
common in cloud data centers for iterative data processing such as PageRank
[?], where much of the data stays unchanged from iteration to iteration. Also,
in many production environments (e.g. [?]), the same job needs to be repeated
many times with almost identical data.

Data center networks. In order to provide reliability and sufficient bisection
bandwidth, many researchers have proposed alternatives to the traditional
2N tree topology [?]. By providing richer connectivity, topologies such as fat-tree
([?, ?]), BCube [?], DCell [?] and VL2 [?] can handle failures more gracefully.
Among them, fat-tree was proposed to use commodity Ethernet switches to
support the full aggregate bandwidth in large-scale data centers. As shown in
Figure 2, a fat-tree is built from a large number of richly connected switches,
and can support any communication pattern with full bisection bandwidth.

Apart from that, the DCN can bring us another special benefit, the regularity
of the topology. Most topologies being used in DCNs follow a multi-tier tree
architecture. The scalability of such topologies is always achieved by scaling
up each individual switch, i.e. by increasing the fan-out of single switches,
rather than scaling out the topology itself. Since such topologies at different
scales possess almost the same properties, the optimization efforts we
make for small-scale networks can be easily adapted to large-scale ones with
very slight changes. This enables us to make use of the unique features of the
well-structured topologies to improve network performance by gaining insights
from small-scale networks.

VM assignment. To improve flexibility and overall hardware-resource
utilization, virtualization has become an indispensable technique in the design and
operation of modern data centers. Acting as a bridge, VM assignment provides
the possibility of combining the characteristics of applications with traffic
engineering. With the goal of improving the performance of the network, VM
assignment can be accomplished by integrating the characteristics of the
running applications and the special features of the network topology. For instance,
knowing the traffic patterns of applications, we can schedule jobs in such a way
that their communication-intensive periods are staggered, or that jobs with
similar communication patterns are placed in different areas of the data center.
As a consequence, the load on the network will be more balanced and the network
utilization will be improved accordingly. By assigning VMs in an appropriate
way, we will be able to obtain better initial conditions for the subsequent
traffic engineering.

Traffic engineering. As a conventional approach for the optimization of
network performance, traffic engineering has also been extensively investigated
in DCNs. Most of the traffic engineering solutions being used in current data
centers are simply adapted from traditional networks. In a traditional
operational network, traffic engineering is usually carried out by traffic measurement,
characterization, modeling and control. However, with the specific features
that characterize DCNs, traffic engineering can be quite different from the
conventional approach. Using the information on traffic patterns provided by VM
assignment, a better understanding of the traffic can be achieved and, consequently,
traffic measurement and characterization can be eliminated, leading to more
precise traffic engineering results. At the same time, we can also take
advantage of the unique features of the DCN topology and design traffic
engineering solutions tailored to it.

Under this new framework, there are some newly arising issues and challenges 
that may need future research efforts. 

a) The applications running in current data centers show regular
communication patterns, which can be obtained by profiling. However, the profiling
method directly conditions the accuracy of this information. As a result,
effective and efficient profiling methods are highly desirable.

b) Different metrics for network performance may prefer different favorable traf- 
fic conditions, which are conditioned by VM assignment. Thus, understand- 
ing the favorable traffic conditions and designing efficient VM assignment 
algorithms to generate them are crucial in this framework. 



c) The universal traffic engineering solutions may not be efficient enough for 
current DCNs. In order to obtain better results, specific traffic engineering 
methods for each particular data center need to be explored by making use 
of both the topology features and the traffic patterns known in advance. 

3 Modeling the Energy-Saving Problem 

In this section, we focus on the energy saving problem. By using the proposed 
framework, we present a temporal model for it and prove its hardness. We start 
with some preliminary modeling. 

3.1 Data Center and Data Center Network 

We consider a data center as a centralized system where a set of servers are
connected by a well-designed network. Suppose there are $m$ servers represented
by set $M$. To achieve better utilization of hardware resources, jobs are processed
by VMs which are hosted by servers. All the servers are connected by a network
$G = (V, E)$, where $V$ is the set of switches and $E$ is the set of links. In this
work, we focus on switch-centric physical network topologies and use the most
representative one, fat-tree, to conduct our work. For each node $v \in V$, the
total traffic load it carries can be expressed by
$x_v = \frac{1}{2} \sum_{e \in E:\, v \text{ is incident to } e} y_e$,
where $y_e$ is the total traffic carried by link $e$. The factor 1/2 avoids counting
each flow twice.

For a single network device, energy-saving strategies have been widely
explored. Among them, speed scaling ([?, ?, ?, ?]) and power down ([?, ?]) are
two representative techniques. The former aims to adaptively reduce the
transmission rate of a network device when the traffic going through it decreases,
resulting in energy savings. This is based on the observation that during most
of the time, the network carries little traffic compared to the peak. Although it
might be more complex to manufacture network equipment with this
functionality, there are already network devices designed with
variable operating rates, such as InfiniBand [?] and Cray YARC [?]. The latter
is to switch off a network device when it is idle, bringing a substantial reduction
in energy consumption. In this paper, we use both strategies in an integrated
way. More precisely, we characterize the power consumption of a switch $v \in V$
by an energy curve $f_v(x_v)$, which indicates how $v$ consumes power as a function
of its transmission speed $x_v$. Usually, function $f_v(x_v)$ can be formalized as

$$ f_v(x_v) = \begin{cases} 0 & \text{for } x_v = 0 \\ \sigma_v + \mu_v x_v^{\alpha} & \text{for } x_v > 0 \end{cases} \qquad (1) $$

where $\sigma_v$ represents a fixed amount of power needed to keep a device active, and
$\mu_v$ and $\alpha$ are parameters associated with devices. This way, if a network device
carries no load, it can be shut down and incurs no cost. Otherwise, an initial
cost is paid at the beginning and then the cost goes up as the assigned load
increases. We assume that the power consumption of a network device grows
superadditively with its load, with $\alpha$ usually larger than 1 (it has been shown
to lie in $(1, 3]$ [?]). Due to the homogeneity in DCNs, it is convenient to assume
a uniform cost function $f(\cdot)$ for all switches. The total cost of a network is
defined as the total power consumption of all of its network devices, given by
$\sum_{v \in V} f(x_v)$.
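
A minimal Python sketch of the cost model in this subsection may help fix the notation. It computes node loads from link loads with the 1/2 factor, evaluates the energy curve under the uniform-cost assumption, and sums the result over all switches. The function names, the dictionary-based link representation and the numeric parameters are illustrative assumptions, not part of the original model.

```python
from collections import defaultdict

def node_loads(link_loads):
    """Return x_v = 1/2 * sum of y_e over the links e incident to v.

    link_loads maps an undirected link (u, v) to its total traffic y_e.
    """
    loads = defaultdict(float)
    for (u, v), y_e in link_loads.items():
        loads[u] += y_e / 2.0
        loads[v] += y_e / 2.0
    return dict(loads)

def switch_power(x, sigma, mu, alpha):
    """Energy curve f(x): zero when idle, sigma + mu * x**alpha otherwise."""
    return 0.0 if x == 0 else sigma + mu * x ** alpha

def network_power(link_loads, switches, sigma, mu, alpha):
    """Total cost: sum of f(x_v) over all switches (servers are excluded)."""
    loads = node_loads(link_loads)
    return sum(switch_power(loads.get(v, 0.0), sigma, mu, alpha) for v in switches)

# Hypothetical example: one flow of 2 units from server s1 to server s2 via switch a.
links = {("s1", "a"): 2.0, ("a", "s2"): 2.0}
total = network_power(links, switches=["a", "b"], sigma=10.0, mu=1.0, alpha=2.0)
```

In this example switch b carries no load and contributes nothing to the total, which is the power-down behaviour captured by the energy curve.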

3.2 Applications 

Now we characterize the communication patterns of the applications. As we 
discussed before, the applications can be classified into three main categories 
according to their communication patterns. For the simplicity of exposition, 
here we use the single peak pattern as an example. Later, we will discuss how 
to adapt our results to other patterns. 

Assume we are given a set of $k$ jobs, represented by $J$, that have to be
processed simultaneously during the period of time of interest $[t_1, t_r]$. The time
unit is chosen according to the traffic patterns of jobs such that during each
unit of time, the traffic is relatively stable. Each job $j \in J$ comprises $n_j$ tasks,
each of which has to be processed on a pre-specified VM. For each job $j \in J$,
we assume a time period $[t_{js}, t_{jt}] \subseteq [t_1, t_r]$ during which the traffic of this job is
substantial and, for each unit of time $t \in [t_{js}, t_{jt}]$, a traffic matrix $T_j(t)$ of size
$n_j \times n_j$ is given to indicate the communication pattern of job $j$ at that time.
For $t \in [t_1, t_{js}) \cup (t_{jt}, t_r]$, we assume there is only very small background traffic,
which has little influence on the network.

3.3 Problem Description 

We now describe the energy-saving problem in DCNs and provide a time-aware
network energy optimization model to redefine this problem. We assume that
the VMs of jobs will not be migrated once they have been assigned, because
in cloud data centers jobs are usually very small and need to be repeated
many times [?]. For example, the average completion time of a MapReduce job
at Google was 395 seconds during September 2007 [?]. Nevertheless, we leave
the case where VM migration is involved as future work. With the modeling
of DCNs and jobs, the total energy consumed by all the network elements for
processing all the jobs can then be represented by
$E_N = \sum_{t=t_1}^{t_r} \sum_{v \in V} f(x_v(t))$,
where $x_v(t)$ is the load of node $v$ at time $t$. Our goal is to assign all the VMs
to servers such that, when we choose appropriate routing paths for the flows
between each pair of communicating VMs, the total cost $E_N$ is minimized.

The optimization procedure can be divided into two closely related stages:
VM assignment and traffic engineering. Given an assignment of VMs, the total
cost can be minimized by applying traffic engineering on the network, which
solves an energy-efficient routing problem. We first assume that an algorithm
$A$ has been given to solve this routing problem. Then, the VM assignment
problem can be modeled by the following integer program.

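$$
\begin{aligned}
(P_1)\quad \min\ & \sum_{t=t_1}^{t_r} A(D(t)) \\
\text{subject to}\ & \sum_{x} C_x z_{ix} \le C_r && \forall i \\
& \sum_{1 \le i \le m} z_{ix} = 1 && \forall x \\
& z_{ix} \in \{0, 1\} && \forall i, x
\end{aligned}
$$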

where $z_{ix}$ indicates whether VM $x$ is assigned to the $i$-th server ($1 \le i \le m$),
$C_x$ is the abstract resource required by VM $x$ and $C_r$ is the total amount of
resource in one server. The second constraint means that each VM must be
assigned to exactly one server. $D(t)$ is a set of traffic demands to be routed
at time $t$. Each demand in $D(t)$ is indicated by a triple consisting of a source,
a destination and a flow amount. Once an assignment is given, $D(t)$ can be
obtained from the traffic patterns of jobs at time $t$.
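
To make the constraints of the assignment program concrete, the following Python sketch checks the capacity constraint for a given assignment and derives the demand set D(t) from the VM-level traffic at one time unit. It assumes an assignment stored as a dictionary from VM to server index, so the one-server-per-VM constraint holds by construction. All names and numbers are hypothetical.

```python
def is_feasible(assignment, vm_resource, num_servers, server_capacity):
    """Check that the resource required on each server stays within C_r."""
    used = [0.0] * num_servers
    for vm, server in assignment.items():
        used[server] += vm_resource[vm]
    return all(u <= server_capacity for u in used)

def demands_at(assignment, vm_traffic):
    """Build D(t) as (source server, destination server, amount) triples.

    vm_traffic maps a VM pair (x, y) to the traffic sent from x to y at time t.
    Traffic between VMs on the same server never enters the network.
    """
    demands = []
    for (x, y), amount in vm_traffic.items():
        src, dst = assignment[x], assignment[y]
        if src != dst and amount > 0:
            demands.append((src, dst, amount))
    return demands

# Hypothetical instance: three VMs on two servers.
assignment = {"vm1": 0, "vm2": 0, "vm3": 1}
resources = {"vm1": 1.0, "vm2": 1.0, "vm3": 2.0}
traffic = {("vm1", "vm2"): 5.0, ("vm1", "vm3"): 3.0}
feasible = is_feasible(assignment, resources, num_servers=2, server_capacity=2.0)
demands = demands_at(assignment, traffic)
```

Only the traffic between vm1 and vm3 becomes a network demand here, which illustrates how the assignment itself determines the input of the routing stage.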

Now we turn to the energy-efficient routing problem that algorithm $A$
aims to solve. After obtaining the traffic demands $D(t)$, this problem can be
stated as follows: given a network $G = (V, E)$ with a node cost function
$f(\cdot)$ and a set of traffic demands $D(t)$, the goal is to route every
demand in $D(t)$ unsplittably, such that the total cost of the network $\sum_{v} f(x_v)$ is minimized,
where $x_v$ is the total load on node $v$. Formally, it can be formulated as the
following integer program.

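$$
\begin{aligned}
(P_2)\quad \min\ & \sum_{v \in V} f(x_v) \\
\text{subject to}\ & x_v = \frac{1}{2} \sum_{e \in E:\, v \text{ is incident to } e} y_e \le C && \forall v \in V \\
& y_e = \sum_{i \in [1, |D(t)|]} y_{i,e} && \forall e \\
& y_{i,e} \in \{0, 1\} && \forall i, e \\
& y_{i,e}:\ \text{flow conservation}
\end{aligned}
$$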

where $y_{i,e}$ is an indicator variable showing whether demand $i$ ($1 \le i \le |D(t)|$)
goes through edge $e$. Flow conservation means that only a source (sink) node can
generate (absorb) flows, while for other nodes, the ingress traffic equals the
egress traffic. $y_e$ is the total load carried by link $e$, and $x_v$ is the total traffic
going through node $v$, which never exceeds the capacity $C$.
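
The objective and the capacity constraint of the routing program can be evaluated for a candidate routing as in the Python sketch below. It assumes each demand is already routed on a single path, given as a list of nodes from source server to destination server, so that the load of a transit switch is the sum of the demands crossing it. The sketch only scores a routing; it is not the routing algorithm A, and the names and values are hypothetical.

```python
def routing_cost(routed_demands, sigma, mu, alpha, capacity):
    """Return the total switch power of an unsplittable routing, or None if
    some switch exceeds the capacity C.

    routed_demands is a list of (amount, path) pairs, where path is
    [source server, switch, ..., switch, destination server].
    """
    load = {}
    for amount, path in routed_demands:
        for switch in path[1:-1]:  # endpoints are servers, not switches
            load[switch] = load.get(switch, 0.0) + amount
    if any(x > capacity for x in load.values()):
        return None
    return sum(sigma + mu * x ** alpha for x in load.values() if x > 0)

# Two unit demands, either consolidated on one switch or spread over two.
consolidated = [(1.0, ["s1", "a", "s2"]), (1.0, ["s3", "a", "s4"])]
spread = [(1.0, ["s1", "a", "s2"]), (1.0, ["s3", "b", "s4"])]
cost_consolidated = routing_cost(consolidated, 10.0, 1.0, 2.0, capacity=4.0)
cost_spread = routing_cost(spread, 10.0, 1.0, 2.0, capacity=4.0)
```

With a fixed cost that is large relative to the load-dependent part, as in these assumed parameters, the consolidated routing is cheaper because the second switch can stay off.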

3.4 Hardness Analysis 

We now analyze the computational complexity of this problem. The
NP-hardness can be proved by a reduction from the general Quadratic Assignment
Problem (QAP), which describes the following problem: there is a set of $n$
facilities and a set of $n$ locations. A distance and a weight are specified for each
pair of locations and facilities respectively. The problem is to assign all facilities
to different locations with the goal of minimizing the sum of the distances
multiplied by the corresponding weights. QAP was first studied by Koopmans and
Beckmann [?] and is well known to be strongly NP-hard. Moreover, achieving
any constant approximation for the general QAP is also NP-hard. It is believed
that even obtaining the optimal solution for a moderate-scale QAP is practically
impossible [?]. Formally, we show the following.

Theorem 1. Finding the optimal solution of the energy-saving problem in DCNs is
NP-hard.

Proof. We prove it by showing that any polynomial-time deterministic algorithm
that can obtain the optimal solution for our energy-saving problem can be used
to solve QAP. Suppose we are given an instance of QAP with $n$ locations and $n$
facilities. For these locations and facilities, we are also given two matrices $M_d$
and $M_c$ of size $n \times n$ to indicate the distance between each pair of locations
and the cost between any two facilities respectively. The total cost of this QAP
instance is

$$ \sum_{i_1, i_2 \in [n]} M_d(i_1, i_2)\, M_c(\pi(i_1), \pi(i_2)) \qquad (2) $$

where $\pi$ is a permutation of $[n]$. The reduction from QAP to our problem is
built as follows: 1) create $n$ nodes for servers; 2) for each pair of servers, connect
them by a single switch; 3) for a switch connecting two servers $s_{i_1}$ and $s_{i_2}$, define
its energy consumption function as $g_{i_1 i_2}(x) = \sigma + M_d(i_1, i_2)\, x^{\alpha}$, where $\sigma$ is a
constant and $M_d(i_1, i_2)$ is the distance between the $i_1$-th and the $i_2$-th locations
in the QAP instance. We treat the facilities as a set of VMs and the $\alpha$-th root
of the cost between any two facilities, $(M_c)^{1/\alpha}$, as the traffic flow between the
corresponding VMs. This way, the corresponding energy-saving problem is to
allocate each VM to one server such that the total energy consumption is
minimized. Given an assignment of VMs $\pi$ (a permutation of $[n]$), the total
energy consumption can be expressed by

$$ \sum_{i_1, i_2 \in [n]} \left( \sigma + M_d(i_1, i_2) \left( M_c(\pi(i_1), \pi(i_2))^{1/\alpha} \right)^{\alpha} \right). \qquad (3) $$

It can be seen that the only difference between the total cost of the QAP
instance (as in (2)) and the one in our problem (as in (3)) is the constant value
$n^2 \sigma$. Therefore, when we obtain the optimal solution for our problem, the
corresponding assignment is also optimal for the corresponding QAP. As a result,
any polynomial-time deterministic algorithm optimally solving the energy-saving
problem in DCNs can be used to solve QAP. That ends the proof. □
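
The key identity of the reduction can be checked numerically on a small instance. The Python sketch below evaluates the QAP objective and the energy of the constructed instance for every permutation and records their difference, which should always equal the constant n^2 * sigma. It follows the expression in the proof literally, so sigma is counted for all n * n pairs. The matrices and parameters are hypothetical.

```python
from itertools import permutations

def qap_cost(Md, Mc, perm):
    """QAP objective: sum of Md(i1, i2) * Mc(perm(i1), perm(i2))."""
    n = len(perm)
    return sum(Md[i][j] * Mc[perm[i]][perm[j]] for i in range(n) for j in range(n))

def energy_cost(Md, Mc, perm, sigma, alpha):
    """Energy of the constructed instance: the switch for pair (i1, i2) pays
    sigma + Md(i1, i2) * x**alpha with flow x = Mc(...) ** (1 / alpha)."""
    n = len(perm)
    total = 0.0
    for i in range(n):
        for j in range(n):
            flow = Mc[perm[i]][perm[j]] ** (1.0 / alpha)
            total += sigma + Md[i][j] * flow ** alpha
    return total

# Hypothetical 3 x 3 instance.
Md = [[0, 2, 3], [2, 0, 1], [3, 1, 0]]
Mc = [[0, 4, 1], [4, 0, 9], [1, 9, 0]]
sigma, alpha, n = 5.0, 2.0, 3
gaps = [energy_cost(Md, Mc, p, sigma, alpha) - qap_cost(Md, Mc, p)
        for p in permutations(range(n))]
```

Because the gap does not depend on the permutation, any assignment that minimizes the energy also minimizes the QAP objective.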

4 Exploring Energy-Efficient VM Assignment 

In this section, we seek energy-efficient VM assignment strategies by exploiting 
some unique features of the usually well-structured topologies of DCNs. Com- 
bining this with the analysis of the characteristics of the applications, we provide 
three main principles to guide the VM assignment. Based on these principles, 
we propose a traffic-aware energy-efficient VM assignment algorithm. First of 
all, we provide the following prerequisites. 



Table 1: Power Rating Profiles of Some Typical Commodity Switches (Unit:
Watts)

Product              Idle or Nominal   Max
Cisco Nexus 3548     152               265
Cisco Nexus 5548P    390               600
HP 5900AF-48XG       200               260
HP 5920AF-24XG       343               366
Juniper QFX 3600     255               345


Definition 2. The power rate of a network device is defined as the power
consumed by every unit of load it carries, i.e., $f(x)/x$ ($x > 0$).

Proposition 3. Ideally, the total power consumption of a network is minimized
when all the network devices carry the load
$R^* = \left( \frac{\sigma}{\mu(\alpha - 1)} \right)^{1/\alpha}$, given a certain
amount of traffic.

Proof. Recall that the cost function of each network device consists of a
constant part and a load-dependent part (as indicated in (1)), and the optimal
solution aims to balance them because of the convexity of the load-dependent cost.
Another observation is that, given a certain amount of traffic, the sum of the
traffic carried by all the switches in the network is invariable, no matter
how the traffic is routed, because the paths connecting each pair of servers
have equal lengths. Based on these, it is natural to see that the total power
consumption of a network is minimized when the power rate of every network
device is minimized. That is, we minimize $\frac{\sigma}{x_v} + \mu x_v^{\alpha-1}$ for each $v \in V_a$, where
$V_a \subseteq V$ is the set of active switches. This can be achieved by choosing all $x_v$ to
be $\left( \frac{\sigma}{\mu(\alpha-1)} \right)^{1/\alpha}$, denoted as $R^*$. □
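
The minimizer follows from a one-line calculation: setting the derivative of sigma/x + mu * x^(alpha-1) to zero gives -sigma/x^2 + mu * (alpha-1) * x^(alpha-2) = 0, i.e. x^alpha = sigma / (mu * (alpha-1)). The short Python sketch below evaluates this closed form and the power rate so the result can be checked numerically. The parameter values are hypothetical and chosen only to make the optimum easy to read off.

```python
def optimal_load(sigma, mu, alpha):
    """R* = (sigma / (mu * (alpha - 1))) ** (1 / alpha), the minimizer of the power rate."""
    return (sigma / (mu * (alpha - 1.0))) ** (1.0 / alpha)

def power_rate(x, sigma, mu, alpha):
    """Power per unit of load for an active device: f(x) / x."""
    return sigma / x + mu * x ** (alpha - 1.0)

# Hypothetical parameters chosen so that R* is easy to read off.
sigma, mu, alpha = 8.0, 2.0, 2.0
r_star = optimal_load(sigma, mu, alpha)
rate_at_r_star = power_rate(r_star, sigma, mu, alpha)
```

For these assumed values the optimal load is 2 units, and moving the load slightly in either direction increases the power paid per unit of traffic.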

However, this proposition may not be directly applicable in reality.
According to the statistics in [?], the idle power consumption of a 48-port edge
LAN switch ranges from 76 Watts to 150 Watts, while working at full speed
adds about 40 Watts or more. Also in [?], the authors measured the
power consumption of a production PRONTO 3240 OpenFlow-enabled switch
and obtained a very similar result. In order to confirm this finding, we collected
the power rating profiles of some commodity switches from the vendors'
websites. The detailed information can be found in Table 1. This confirms that the
idle power usually takes up a large portion of the total power consumption. This
basically means that the startup cost $\sigma$ in our model is quite high. As a
consequence, it is more likely to have $R^* > C$. However, as the load on a switch
cannot be larger than $C$, Proposition 3 might not apply. In order to respect
this finding, we will assume in the rest of this work that $R^* > C$, which is
equivalent to $\sigma > \mu(\alpha - 1)C^{\alpha}$.

Figure 3: Two ToR switches connected by a network with a general topology

4.1 VM Assignment Principles 

We now propose the principles for VM assignment that need to be followed
to achieve better energy efficiency in DCNs. We use a bottom-up
approach, i.e., in a fat-tree, we first focus on racks, then on pods and finally
on the whole data center.

4.1.1 Arranging VMs into Racks 

We first concentrate on Top-of-Rack (ToR) switches, as ToR switches are
different from other ones in the network. Once there is at least one active server
in a rack, the corresponding ToR switch cannot be shut down because there
might be some inter-rack traffic. ToR switches also have to carry the intra-rack
traffic, which cannot be diverted to other switches. As a result, the power
consumption of ToR switches will be largely conditioned by VM assignment. The
following result can help us determine the right number of ToR switches.

Theorem 4. (Principle 1) The optimal VM assignment consists in compact- 
ing VMs into racks as tightly as possible in order to minimize the power con- 
sumption of ToR switches. 

Proof. We focus on two arbitrary ToR switches in a fat-tree like the one in
Figure 3. Let $A$ and $B$ represent the sets of VMs assigned to the servers
connected to the two switches respectively. In order to conduct our
comparison, we assume, without loss of generality, that all the VMs in set $B$ can
be accommodated in the left-side servers without violating the resource
capacity. Assume the traffic between each pair of VMs in $A$ and $B$ is
characterized by a matrix $Q$, where $Q(x, y)$ indicates the traffic flow sent from
VM $x$ to VM $y$. Denote
$X = \sum_{x \in A} \sum_{y \in A} Q(x, y)$, $Y = \sum_{x \in A} \sum_{y \in B} Q(x, y)$,
$Z = \sum_{x \in B} \sum_{y \in A} Q(x, y)$, and $W = \sum_{x \in B} \sum_{y \in B} Q(x, y)$.

For the setting we have in Figure 3, apart from the energy consumed by the
switches lying in the intermediate network, the total network power consumption
incurred by the traffic generated by VMs in $A$ and $B$ is represented by
$P_1 = 2\sigma + \mu (X + Y + Z)^{\alpha} + \mu (Y + Z + W)^{\alpha}$. Then, we consider an alternative
assignment where we move all the VMs in $B$ to the left-side servers. On one
hand, the right-side ToR switch can then be shut down in order to save energy,
as there will be no VM on the right side. Moreover, the power consumption of
the intermediate network will be reduced because of the traffic decrease. On the
other hand, the traffic load carried by the left-side ToR switch will increase,
resulting in more power consumption. The total power consumed by the two
ToR switches after this VM consolidation becomes $P_2 = \sigma + \mu (X + Y + Z + W)^{\alpha}$.
We now compare the power consumption in the above two cases. We denote
$\Delta P$ as the difference between them. Then, we have

$$ \Delta P \ge \sigma + \mu \left( (X + Y + Z)^{\alpha} + (Y + Z + W)^{\alpha} - (X + Y + Z + W)^{\alpha} \right). $$

Now we consider the following two cases. 

Case 1: $\alpha \ge 2$. As $\sigma > \mu(\alpha - 1)C^{\alpha}$, we have
$\sigma > \mu(\alpha - 1)C^{\alpha} \ge \mu C^{\alpha} \ge \mu (X + Y + Z + W)^{\alpha}$.
The third inequality follows from $X + Y + Z + W \le C$.
Then, we have $\Delta P > 0$.

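Case 2: $1 < \alpha < 2$. We define the function

$$ f(Y, Z) = (X + Y + Z)^{\alpha} + (Y + Z + W)^{\alpha} - (X + Y + Z + W)^{\alpha}. $$

It is easy to check that the partial derivatives of $f(Y, Z)$ with respect to both $Y$ and $Z$
are non-negative, i.e., $\frac{\partial f}{\partial Y} \ge 0$ and $\frac{\partial f}{\partial Z} \ge 0$. This means that function
$f(Y, Z)$ is monotonically increasing in both $Y$ and $Z$. By setting $Y = Z = 0$ we
have

$$
\begin{aligned}
\Delta P &\ge \sigma + \mu \left( X^{\alpha} + W^{\alpha} - (X + W)^{\alpha} \right) \\
&\ge \mu C^{\alpha} (\alpha - 1) + \mu \left( 2 \left( \tfrac{C}{2} \right)^{\alpha} - C^{\alpha} \right) \\
&= \mu C^{\alpha} \left( \alpha + \frac{2}{2^{\alpha}} - 2 \right) > 0.
\end{aligned}
$$

The second inequality comes from the fact that $X^{\alpha} + W^{\alpha} - (X + W)^{\alpha}$ is
minimized when $X = W = C/2$ with $X + W \le C$. The last inequality can be
easily verified.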

In short, $\Delta P > 0$ holds for any $\alpha > 1$, so consolidating VMs will provide
us with better energy efficiency for the two ToR switches. As a result, reducing
the number of active ToR switches as much as possible will always improve the
network energy efficiency. As we have stated, this means compacting VMs into
racks as tightly as we can, which completes the proof. □
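
The bound used in this proof can be explored numerically. The Python sketch below evaluates the lower bound on the power difference between the two-switch and the one-switch assignment over a grid of traffic volumes, under the standing assumption that the fixed cost exceeds mu * (alpha - 1) * C^alpha. The grid, the capacity C = 1 and the chosen exponents are hypothetical, and the check is only an illustration of the inequality, not a substitute for the proof.

```python
def consolidation_gain(X, Y, Z, W, sigma, mu, alpha):
    """Lower bound on Delta P: power of two ToR switches minus power of one
    ToR switch hosting all VMs (intermediate-network savings are ignored)."""
    p1 = 2 * sigma + mu * (X + Y + Z) ** alpha + mu * (Y + Z + W) ** alpha
    p2 = sigma + mu * (X + Y + Z + W) ** alpha
    return p1 - p2

# Hypothetical setting: capacity C = 1, mu = 1, and sigma just above
# mu * (alpha - 1) * C**alpha, as assumed in the text.
C, mu = 1.0, 1.0
gains = []
for alpha in (1.2, 1.5, 2.0, 3.0):
    sigma = mu * (alpha - 1.0) * C ** alpha + 1e-6
    for a in range(5):
        for b in range(5):
            for c in range(5):
                for d in range(5):
                    if a + b + c + d <= 4:  # total traffic at most C
                        X, Y, Z, W = (q * C / 4 for q in (a, b, c, d))
                        gains.append(consolidation_gain(X, Y, Z, W, sigma, mu, alpha))
```

Every entry of `gains` is positive for these parameters, in line with the claim that consolidation never hurts the ToR layer when the fixed cost dominates.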

4.1.2 Arranging VMs in one Pod 

We now discuss how to allocate VMs in one single pod. Clearly, we
will have no chance to switch off any more ToR switches once the above principle has
been applied. Based on this, we assume a scenario where there are a few jobs
whose VMs are assigned to one pod, but only one of them has substantial traffic
among its VMs at a certain time. Now we discuss how to arrange the VMs of
that job into racks in this pod in order to reduce the power consumption of the
switches. The following result can be obtained.

Theorem 5. (Principle 2) Suppose there are $K$ racks in one pod, where
$K \ge 4^{\alpha/(\alpha-1)}$. Then, distributing the VMs into $k$ racks will incur less power consumption
compared to compacting the VMs into one single rack, where $4^{\alpha/(\alpha-1)} \le k \le K$.

Proof. Suppose we are given a set of jobs in which there is only one job that
has significant networking requirements. We focus on the assignment of the
VMs for this job. First, we consider an even distribution of these VMs over
$k$ different racks. We denote the intra-rack traffic on each ToR switch as $u_i$
($1 \le i \le k$) and the inter-rack traffic between the $i_1$-th and the $i_2$-th rack as
$w_{i_1 i_2}$ ($1 \le i_1, i_2 \le k$). Assume we only use half of the aggregation switches to
carry the load evenly, which is quite conservative with respect to the
energy-saving strategies for routing (because we will need to shut down some switches
later). The total power consumption of the switches incurred by traffic loads in
this pod is



^^ = E 



k \ 7 / V^ ^ V^ ^ 



while assigning all the VMs into one single rack supposes a total power con- 
sumption of 

For the sake of tractability, but without loss of generality, we consider the case 
where all the Ui are roughly equal, denoted as u, and all the Wi-^i^ are also 
roughly identical, denoted as w. Define AP = P2 — Pi Then, we have 

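\[
\begin{aligned}
\Delta P &= \mu \Big(ku + \frac{k(k-1)}{2} w\Big)^{\alpha} - \mu k \big(u + (k-1)w\big)^{\alpha} - \mu \frac{k}{2} \big((k-1)w\big)^{\alpha} \\
&\ge \mu k^{\alpha} \Big(u + \frac{k-1}{2} w\Big)^{\alpha} - \mu k \Big(\big(u + (k-1)w\big)^{\alpha} + \big((k-1)w\big)^{\alpha}\Big) \\
&\ge \mu k^{\alpha} \Big(u + \frac{k-1}{2} w\Big)^{\alpha} - \mu k \big(u + 2(k-1)w\big)^{\alpha} \\
&\ge \mu \big(k^{\alpha} - 4^{\alpha} k\big) \Big(u + \frac{k-1}{2} w\Big)^{\alpha} \ge 0, \qquad (4)
\end{aligned}
\]

where the second inequality is due to the convexity of the power consumption
incurred by traffic loads, and the last inequality comes from our assumption
that $k \ge 4^{\alpha/(\alpha-1)}$. Thus, as long as $K \ge 4^{\alpha/(\alpha-1)}$, it is
possible to distribute all the VMs to all servers in one pod in order to
reduce the power consumption. □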
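
To illustrate Principle 2 numerically, the sketch below evaluates the
uniform-traffic expression for $\Delta P$ (compact placement minus placement
over $k$ racks) together with the rack threshold $4^{\alpha/(\alpha-1)}$. The
function names are hypothetical, the value of $\mu$ is an assumption, and only
the load-dependent part of the power is considered, as in the proof.

```python
def rack_threshold(alpha):
    """Rack count 4**(alpha/(alpha-1)) from which the proof guarantees a gain."""
    return 4.0 ** (alpha / (alpha - 1.0))


def distribution_gain(k, u, w, mu=1e-4, alpha=2.0):
    """Delta P = P2 - P1 for uniform traffic.

    u -- intra-rack traffic per rack, w -- traffic between each pair of racks.
    P2: everything on one ToR switch; P1: k ToR switches plus k/2 aggregation
    switches that share the inter-rack traffic evenly.
    """
    compact = mu * (k * u + k * (k - 1) / 2.0 * w) ** alpha
    tor = k * mu * (u + (k - 1) * w) ** alpha
    agg = (k / 2.0) * mu * ((k - 1) * w) ** alpha
    return compact - tor - agg


if __name__ == "__main__":
    print(rack_threshold(2.0))
    for k in (2, 4, 8, 16, 32):
        print(k, distribution_gain(k, u=0.0, w=1.0))
```

The threshold is only a sufficient condition: depending on the ratio between
$u$ and $w$, the gain may already be positive for fewer racks, while for very
small $k$ and negligible intra-rack traffic it can be negative.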

The intuition behind this principle is that distributing VMs among multiple
racks moves some traffic from the ToR switches to the upper-layer network.
With the rich connectivity provisioned in the upper-layer network and the
convexity of energy consumption, this brings a potential benefit in energy
saving. For instance, if $\alpha = 2$, as long as we evenly distribute the VMs
of one job to $k = 16$ or more racks, the energy consumption will be reduced
compared to compacting them into one rack. In production data centers, having
more than 16 racks in one pod is quite realistic. As we are considering big
data centers, we can claim that $K$ is not smaller than $4^{\alpha/(\alpha-1)}$.
Note that if all $w_{i_1 i_2}$ are small, the energy saving will be even more
significant.

This result could be extended to jobs with VMs assigned to more than one
rack. However, we consider the single-rack case sufficient because the authors
of [?] pointed out that most jobs can be fitted into a single rack in a
large-scale data center and that, in general, few jobs share a link at the
same time. Moreover, the most important reason why we can use this principle
is that our model is time-aware: we assign jobs with complementary traffic
patterns to the same pod. This way, the interference between different jobs is
greatly reduced, which makes our assumption for this principle reasonable.

4.1.3 Arranging VMs among Pods 

The last principle concerns how to assign VMs among pods. Basically, we want
to answer the following question: is it better to spread the VMs of the same
job over different pods or to put them together in one single pod?

Theorem 6. (Principle 3) As long as there are enough resources in one pod,
the optimal assignment will never split VMs from the same job to different pods.

Proof. Suppose we have one job with all its VMs assigned to a single pod $A$.
Now we consider moving some VMs from $A$ to a new pod $B$. Consider the simple
case where we move the VMs assigned to one whole rack in $A$ to an empty rack
in $B$. In a fat-tree, the outer fan-out (to other pods) of each aggregation
switch is not larger than its inner fan-out (to ToR switches within the pod),
so the number of node-disjoint paths between any two ToR switches in $A$ is
not smaller than the number between one ToR switch in $A$ and one in $B$. As a
result, moving the VMs of a whole rack to a different pod never reduces the
traffic on any ToR or aggregation switch, while it brings extra traffic to the
core switches; hence it does not reduce the power consumption. By the same
argument, moving only some of the VMs to another pod is not beneficial
either. □

4.2 Traffic-Aware Energy-Efficient VM Assignment 

We devise our VM assignment algorithm using the three proposed principles.
The algorithm assigns VMs so that the resulting traffic patterns are favorable
for saving energy on the network, by observing these principles. It takes a
set of jobs (sets of VMs) with their traffic patterns and a set of servers as
input, and returns a job (VM) assignment after the three steps listed in
Algorithm 1.

Firstly, we carry out a transformation of the VMs. In our model each server
can host multiple VMs, which brings high complexity to the subsequent steps.
Before the VM assignment, we therefore transform the VMs into super-VMs, each
of which will be assigned alone to a single server, based on the following
proposition.

Proposition 7. Compacting the VMs with high communication traffic will reduce
the network power consumption.

Proof. This follows from the fact that the traffic between VMs assigned to the
same server reaches neither the physical NICs of the host server nor the
network. It is therefore natural to assign VMs of the same job to the same
servers in order to reduce the network traffic. In this sense, compacting VMs
with high communication traffic reduces the traffic on the network, resulting
in more energy savings. □

Algorithm 1 Traffic-Aware Energy-Efficient VM Assignment

Input: data center topology $G = (V, E)$, set of servers $S$ and set of jobs $J$
Output: assignment for all the VMs of all jobs

 1: for $j \in J$ do
 2:    Transform VMs to super-VMs
 3: end for
 4: Cluster jobs in $J$ into groups $S_i$ for $i \in [1, N_p]$ and $S_{N_p+1}$
 5: for $1 \le i \le N_p$ do
 6:    Partition the super-VMs of each job $j \in S_i$ into $K$ parts using the
       min-$k$-cut algorithm
 7:    Assign the super-VMs of each job into servers according to the partition
 8: end for
 9: Assign the VMs of jobs in $S_{N_p+1}$ into vacant servers in the first
    $N_p$ pods flexibly.

To complete this transformation, we define a referential traffic matrix
$T_j^{ref}$ for each job $j \in J$, where

\[
T_j^{ref}(x, y) = \sum_{t} T_j(t)(x, y) \quad \forall x, y \in [1, n_j].
\]

We shrink VMs to super-VMs as follows. For each job $j \in J$, we run the
following process iteratively: in each iteration, we choose the biggest value
in matrix $T_j^{ref}$. Suppose this value lies in the $x_1$-th row and
$y_1$-th column. Then, we combine the $x_1$-th VM with the $y_1$-th one by
removing the traffic between them and adding up their traffic with other VMs.
After that, we choose the biggest value in the $x_1$-th row and $y_1$-th row,
and combine the corresponding VMs. We call the VM obtained by this shrinking a
super-VM. We repeat this procedure until the resources of one server would be
fully utilized if this super-VM were assigned to it. Then, we remove all the
VMs that have been chosen and shrunk from the matrix, and find the next
biggest value to start a new iteration. With this transformation, all the jobs
are represented by super-VMs, each of which is assigned to one single server.
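
The shrinking procedure above can be sketched as follows. This is a minimal
illustration under stated assumptions, not the authors' implementation: the
function name `build_super_vms` is hypothetical, the traffic matrix is assumed
symmetric, and server resources are modelled simply as a maximum number of VMs
per server (`capacity`). Picking the VM with the largest total traffic towards
the current group corresponds to picking the biggest value in the merged row.

```python
def build_super_vms(traffic, capacity):
    """Greedily merge the VMs of one job into super-VMs.

    traffic  -- symmetric matrix (list of lists) of referential traffic
    capacity -- assumed maximum number of VMs that one server can host
    Returns a list of super-VMs, each a sorted list of VM indices.
    """
    remaining = set(range(len(traffic)))
    super_vms = []
    while remaining:
        candidates = sorted(remaining)
        if capacity < 2 or len(candidates) == 1:
            group = [candidates[0]]
        else:
            # Seed the super-VM with the pair exchanging the most traffic.
            pairs = [(a, b) for i, a in enumerate(candidates)
                     for b in candidates[i + 1:]]
            group = list(max(pairs, key=lambda p: traffic[p[0]][p[1]]))
        remaining -= set(group)
        # Grow the super-VM with the VM that talks most to the current group.
        while len(group) < capacity and remaining:
            best = max(sorted(remaining),
                       key=lambda v: sum(traffic[v][g] for g in group))
            group.append(best)
            remaining.remove(best)
        super_vms.append(sorted(group))
    return super_vms


if __name__ == "__main__":
    t_ref = [[0, 9, 1, 2],
             [9, 0, 3, 1],
             [1, 3, 0, 8],
             [2, 1, 8, 0]]
    print(build_super_vms(t_ref, capacity=2))
```

In the example, the two pairs with the heaviest mutual traffic end up in the
same super-VM, so their traffic never leaves the host server.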

Secondly, we partition the jobs into different pods. Before that, we assume
that every job can be accommodated in a single pod. If there were huge jobs
requiring more than one pod, they could be assigned first in a greedy way,
before the residual normal jobs are considered. From Principles 1 and 3 we
know that the number of pods used for accommodating all the jobs has to be
minimized. In other words, it is not wise to separate the super-VMs of the
same job into different pods if this job can be assigned to a single pod.
Based on this, we estimate the number of pods to be used by summing up the
resources requested by all the jobs. We denote the estimated number of pods as
$N_p$. Then, we partition the set of jobs into those $N_p$ pods by using a
revised $k$-means clustering algorithm which takes the traffic patterns of the
jobs into account. With the intuition that it is better to consolidate jobs
with strongly different traffic patterns into the same pod to improve the
utilization of network equipment, the algorithm compares the traffic patterns
of jobs and clusters them into groups such that the jobs within each group
differ as much as possible in their communication patterns.

To this end, we first calculate a traffic pattern vector $v_j$ with size
$\Gamma$ for each job $j \in J$. Each dimension of $v_j$ indicates the average
traffic between any two VMs of job $j$ in one unit of time, denoted
$T_j^{avg}(t)$, which is computed from the traffic matrix $T_j(t)$ if
$t \in [t_{js}, t_{jt}]$; otherwise we set $T_j^{avg}(t)$ to $\epsilon$, where
$\epsilon$ is infinitesimal. The traffic pattern vector can now be expressed as

\[
v_j = \big(T_j^{avg}(t_1), \ldots, T_j^{avg}(t_\Gamma)\big).
\]

We then give the following definition.

Definition 8. Given two jobs $j_1, j_2 \in J$ with traffic pattern vectors
$v_{j_1}$ and $v_{j_2}$ respectively, the distance between the two jobs is
defined as

\[
dis(j_1, j_2) = dis(v_{j_1}, v_{j_2}) = \frac{1}{\lVert v_{j_1} - v_{j_2} \rVert_2}.
\]

Under this definition, any two jobs with similar traffic patterns receive a
large distance between them. Given these distances, the clustering algorithm
works as follows:

1) Choose $N_p$ jobs and put them into sets $S_i$ for $i \in [1, N_p]$ with
one job per set. Use the traffic pattern vectors of these jobs as center
vectors $c_i$ of these sets. We adopt this initializing step from the refined
$k$-means++ algorithm [?]. In the traditional $k$-means algorithm, the initial
cluster centers are chosen randomly, which can lead to arbitrarily bad
results. Compared to the traditional one, the $k$-means++ algorithm can
guarantee an approximation ratio $O(\log N_p)$ in expectation.

2) For each of the residual jobs $j$, find the nearest cluster $i$ with
respect to the distance $dis(v_j, c_i)$. If this job can be accommodated into
this cluster without any resource violation, put this job into set $S_i$.
Otherwise choose the next one with a larger distance and repeat until a
cluster is found that can accommodate it.

3) Update the center vector of cluster $i$ by averaging all the vectors of
jobs in set $S_i$:

\[
c_i = \frac{1}{|S_i|} \sum_{j \in S_i} v_j \quad \forall i \in [1, N_p].
\]

Repeat 2) and 3) until all the jobs have been assigned. If there are some jobs
that cannot find any cluster to accommodate them, put them into an extra set
$S_{N_p+1}$.
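
A minimal sketch of this capacity-constrained clustering is given below. It
rests on several assumptions that are not taken from the paper: the distance
is modelled as the inverse Euclidean norm of the difference of two pattern
vectors (so that similar patterns are far apart), the first $N_p$ jobs are
used as seeds instead of the $k$-means++ seeding, and pod resources are
counted in servers. All function and parameter names are hypothetical.

```python
import math


def distance(v1, v2):
    """Assumed distance: inverse Euclidean norm, large for similar patterns."""
    norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))
    return math.inf if norm == 0 else 1.0 / norm


def cluster_jobs(vectors, demands, n_pods, pod_capacity):
    """Group jobs with complementary traffic patterns into pods.

    vectors -- {job: traffic pattern vector}
    demands -- {job: number of servers requested}
    Returns (clusters, overflow); overflow plays the role of the extra set.
    """
    jobs = list(vectors)
    seeds = jobs[:n_pods]                      # simple seeding, not k-means++
    clusters = [[j] for j in seeds]
    centers = [list(vectors[j]) for j in seeds]
    used = [demands[j] for j in seeds]
    overflow = []
    for j in jobs[n_pods:]:
        # Try clusters from the nearest to the farthest center.
        order = sorted(range(len(clusters)),
                       key=lambda i: distance(vectors[j], centers[i]))
        for i in order:
            if used[i] + demands[j] <= pod_capacity:
                clusters[i].append(j)
                used[i] += demands[j]
                members = [vectors[m] for m in clusters[i]]
                # The center is the component-wise average of the members.
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
                break
        else:
            overflow.append(j)
    return clusters, overflow


if __name__ == "__main__":
    eps = 1e-9
    vecs = {"A": [10, eps, eps], "B": [eps, 10, eps],
            "C": [10, eps, eps], "D": [eps, 10, eps]}
    print(cluster_jobs(vecs, {j: 2 for j in vecs}, n_pods=2, pod_capacity=4))
```

In the example, jobs A and C are active in the same time slot, so C is placed
with B, whose traffic peak occurs at a different time, and D is placed with A.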

Thirdly, we choose $N_p$ free pods and assign the jobs in each cluster to one
pod. Inspired by Principle 2, we distribute the super-VMs of each job into
multiple racks. The simplest way is to randomly partition these super-VMs
into $K$ racks, where $K$ is the total number of racks in one pod. However, as
we have stated before, it is better to allocate the VMs with the highest
traffic flows into the same rack. The problem then becomes partitioning the
set of super-VMs into $K$ parts such that the traffic between the parts is
minimized. This is equivalent to the well-known minimum $k$-cut problem, which
requires finding a minimum-weight set of edges whose removal partitions a
graph into $k$ connected components. The partition algorithm used here is
adopted from the minimum $k$-cut algorithm in [?]. For each job $j$, we build
a graph $G_j = (V_j, E_j)$, where $V_j$ represents the set of super-VMs and
$E_j$ represents the traffic between each pair of super-VMs. Then, we compute
the Gomory-Hu tree for $G_j$ and obtain $n_j - 1$ cuts $\{g_i\}$, which
contain the minimum-weight cuts for all super-VM pairs. We remove the smallest
$K - 1$ cuts from $\{g_i\}$ and get $K$ connected components of $G_j$. We
regard the super-VMs in the same component as a super-VM cluster and assign
them to the same rack.
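
To make the goal of this step concrete, the sketch below partitions the
super-VMs of one job into $K$ groups with a much simpler heuristic than the
Gomory-Hu based algorithm used in the paper: it builds a maximum spanning
forest Kruskal-style, merging the pairs with the heaviest traffic first and
stopping as soon as only $K$ components remain. This is a plainly named
stand-in, not the minimum $k$-cut algorithm of [?]; the function name and the
edge-dictionary input format are hypothetical, and rack capacity is ignored.

```python
def partition_super_vms(n, traffic, k):
    """Split super-VMs 0..n-1 into (at most) k groups of heavy mutual traffic.

    traffic -- {(a, b): weight} for pairs of super-VMs that exchange traffic
    Returns the groups as sorted lists of super-VM indices.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    # Heaviest edges first: they should stay inside one rack.
    for weight, a, b in sorted(((w, a, b) for (a, b), w in traffic.items()),
                               reverse=True):
        if components <= k:
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
            components -= 1

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


if __name__ == "__main__":
    edges = {(0, 1): 10, (1, 2): 9, (2, 3): 1, (3, 4): 8, (0, 4): 2}
    print(partition_super_vms(5, edges, k=2))
```

If the traffic graph has more than $K$ connected components to begin with, the
sketch returns all of them; a complete implementation would also have to
respect the number of servers per rack.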

After obtaining all the partitions of the jobs in every pod, we assign these
partitions to racks. For each job, we sort the super-VM clusters in decreasing
order of cluster size. After that, we assign each cluster of the job to racks
in a greedy manner. When the assignment of a job is done, we sort all the
racks in increasing order of the number of used servers and assign the next
job by repeating the above process, until all the jobs have been assigned.
Finally, we assign the jobs in set $S_{N_p+1}$ to the $N_p$ pods flexibly.
Note that this can be accomplished because $N_p$ is computed from the total
resources required, so with $N_p$ pods all the jobs can be accommodated.
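
One possible reading of this greedy rack assignment is sketched below. The
paper does not spell out the greedy rule, so the following choices are
assumptions: racks are ordered once per job by the number of used servers,
clusters are processed from largest to smallest, and each cluster goes to the
first rack in that order that has room and has not yet received a cluster of
the same job (falling back to any rack with room). The function name and the
data layout are hypothetical.

```python
def assign_clusters_to_racks(jobs, n_racks, rack_size):
    """Place the super-VM clusters of each job on the racks of one pod.

    jobs -- list of jobs, each given as a list of cluster sizes (in servers)
    Returns (placement, used) where placement[j][c] is the rack index chosen
    for cluster c of job j (None if no rack has enough free servers).
    """
    used = [0] * n_racks
    placement = []
    for clusters in jobs:
        # Least-used racks first; the order is fixed while this job is placed.
        racks = sorted(range(n_racks), key=lambda r: (used[r], r))
        taken = set()
        result = [None] * len(clusters)
        # Largest clusters are placed first.
        for c in sorted(range(len(clusters)), key=lambda c: -clusters[c]):
            fits = [r for r in racks if used[r] + clusters[c] <= rack_size]
            fresh = [r for r in fits if r not in taken]
            choice = (fresh or fits or [None])[0]
            if choice is not None:
                used[choice] += clusters[c]
                taken.add(choice)
            result[c] = choice
        placement.append(result)
    return placement, used


if __name__ == "__main__":
    print(assign_clusters_to_racks([[3, 1, 2], [2, 2]], n_racks=3, rack_size=4))
```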

5 Energy-Efficient Routing 

In this section, we focus on traffic engineering in DCNs to achieve energy conser- 
vation. We first explore the relation between energy consumption and routing 
and then, based on this relation, we design a two-phase energy-efficient routing 
algorithm. 

5.1 Exploring Energy Saving Properties 

As we have discussed in the previous section, in reality we have R* > C. In 
order to achieve energy saving, we need to answer the following questions: how 
many switches will be sufficient and how to distribute the traffic flows? In this 
section, we will explore the relation between energy saving and routing, and 
answer these questions. 

The second question is easy to answer once we have solved the first one.
With the optimal number of switches determined, the best way to achieve energy
saving is to balance the traffic among all the used switches, because power
consumption is convex in the load. In DCNs, this can be done by many
multi-path routing protocols such as Equal Cost Multi-Path (ECMP) and Valiant
Load Balancing (VLB). This is because data centers usually have networks with
richer connectivity, and these multi-path routing protocols use hash-based or
randomized techniques to spread traffic across multiple equal-cost paths. Some
more sophisticated techniques such as Hedera [?] and MPTCP ([?, ?]) can also
be applied to ensure uniform traffic spread despite flow length variations.

To answer the first question, we first concentrate on the aggregation switches
(we have shown that nothing can be done with ToR switches once the VMs have
been assigned). In general, we show the following lemma.

Lemma 9. The optimal energy-efficient routing algorithm will use as few
aggregation switches as possible.

Proof. We focus our attention on the aggregation switches in one pod. Recall
that in the fat-tree topology shown in Figure [?], the ToR switches and the
aggregation switches are connected by all-to-all links. Thus, we can choose
any aggregation switch to carry any flow coming out of or going into a ToR
switch. Denote the minimum number of aggregation switches to be used as
$N_a$. We will show that for any $n \ge N_a$, the minimum total power
consumption of the aggregation switches obtained using $n$ aggregation
switches is always smaller than the one obtained using $n + 1$ aggregation
switches.

Assume that in the optimal solution with $n$ aggregation switches, the total
load going through the $i$-th aggregation switch is $p_i \in (0, C]$
($1 \le i \le n$), while using $n + 1$ aggregation switches this value is
$q_i \in (0, C]$ ($1 \le i \le n + 1$). Since all the switches are identical,
without loss of generality, we assume that $q_{n+1}$ is the most loaded one
among all $q_i$, and that the $p_i$ and the residual $q_i$ are sorted in
descending order. Denote $\delta_i = p_i - q_i$ for $1 \le i \le n$; then, we
have $q_{n+1} = \sum_{i=1}^{n} \delta_i$. Since both solutions are optimal, it
can be observed that $\delta_i \ge 0$ for $1 \le i \le n$. Using $n$ switches,
the total power consumption is

\[
P(n) = n\sigma + \mu \sum_{i=1}^{n} p_i^{\alpha},
\]

while using $n + 1$ switches it is

\[
P(n+1) = (n+1)\sigma + \mu \sum_{i=1}^{n} (p_i - \delta_i)^{\alpha} + \mu \Big(\sum_{i=1}^{n} \delta_i\Big)^{\alpha}.
\]

To complete the proof, it is sufficient to show that for any $n \ge N_a$, we
have $P(n+1) > P(n)$. Denote the difference between the two optimal solutions
as $\Delta P$. Then,

\[
\Delta P = P(n+1) - P(n) = \sigma + \mu \sum_{i=1}^{n} \big((p_i - \delta_i)^{\alpha} - p_i^{\alpha}\big) + \mu \Big(\sum_{i=1}^{n} \delta_i\Big)^{\alpha}.
\]

Note that $\Delta P$ is a function of the variables $p$ and $\delta$, where
$p = (p_1, p_2, \ldots, p_n)$ and $\delta = (\delta_1, \delta_2, \ldots, \delta_n)$.
Since $p$ and $\delta$ are independent and $\delta_i \ge 0$, $\Delta P$ is
minimized when we set $p = (C, C, \ldots, C)$. That is,

\[
\begin{aligned}
\Delta P &\ge \sigma + \mu \sum_{i=1}^{n} \big((C - \delta_i)^{\alpha} - C^{\alpha}\big) + \mu \Big(\sum_{i=1}^{n} \delta_i\Big)^{\alpha} \\
&\ge \sigma + \mu \left[(n+1)\Big(\frac{nC}{n+1}\Big)^{\alpha} - nC^{\alpha}\right] \\
&\ge \mu C^{\alpha} \left(\alpha - 1 + n\left(\Big(\frac{n}{n+1}\Big)^{\alpha-1} - 1\right)\right) \\
&> 0
\end{aligned}
\]

when $\alpha > 1$. The second inequality comes from the fact that $\Delta P$ is
minimized when we set
$\delta = \big(\frac{C}{n+1}, \frac{C}{n+1}, \ldots, \frac{C}{n+1}\big)$. The third
inequality is due to the restriction that $\sigma \ge \mu C^{\alpha}(\alpha - 1)$.
The fourth inequality is obtained by applying the necessary condition on $n$
that the first derivative equals zero. Having $\Delta P > 0$ means that using
fewer aggregation switches results in better energy efficiency. □

The same technique can also be applied to the core switches if we ensure that
each flow can be routed by the candidate core switches when we choose
aggregation switches in each pod. This is easy to achieve if we choose
aggregation switches from the same positions in different pods, since there
are then guaranteed to be core switches connecting each pair of them. Taken
together, we have

Theorem 10. In the optimal energy-saving solution, the number of active 
switches is minimized. 
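
The effect stated in Lemma 9 and Theorem 10 can be checked numerically with
the short sketch below. It assumes that the load is balanced perfectly over
the active switches (the best case for a convex power function) and uses
parameter values that satisfy $\sigma \ge \mu C^{\alpha}(\alpha - 1)$; the
function names and the default values are assumptions made for illustration.

```python
import math


def min_switches(total_load, capacity):
    """Smallest number of switches able to carry total_load."""
    return max(1, math.ceil(total_load / capacity))


def balanced_power(total_load, n, sigma=200.0, mu=1e-4, alpha=2.0):
    """Total power of n identical switches sharing total_load evenly."""
    return n * (sigma + mu * (total_load / n) ** alpha)


if __name__ == "__main__":
    load, capacity = 2500.0, 1000.0          # Gbps
    n_a = min_switches(load, capacity)
    for n in range(n_a, n_a + 5):
        print(n, round(balanced_power(load, n), 2))
```

With these values, every additional active switch increases the total power,
because the static part $\sigma$ outweighs the savings obtained from the lower
per-switch load.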

5.2 Two-Phase Energy-Efficient Routing 

Based on the answers to the two questions we asked at the beginning of this
section, we devise an energy-efficient routing algorithm, as listed in
Algorithm 2. For each unit of time, we repeat the following two phases. In the
first phase, the algorithm finds a subset of switches in a bottom-up manner.
The number of active switches is estimated by a simple calculation where we
divide the total traffic by the capacity of the switch. However, as the
multipath routing algorithm may not distribute the traffic flows perfectly
evenly, we use the first-fit decreasing algorithm, a good approximation for
the bin-packing problem in which we treat the flows as objects and the maximum
transmission rate of the switch as the bin size, to ensure that all the
traffic flows can be routed using the selected switches.
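
The two estimates used in the first phase can be sketched as follows: the
simple division of the total traffic by the switch capacity, and the first-fit
decreasing packing that treats each flow as an unsplittable object. The
function names are hypothetical, and treating flows as unsplittable is an
assumption of this sketch.

```python
import math


def estimate_by_division(flows, capacity):
    """Lower bound on the number of switches: total traffic over capacity."""
    return math.ceil(sum(flows) / capacity)


def first_fit_decreasing(flows, capacity):
    """Pack flows (objects) into switches (bins) with first-fit decreasing.

    Returns the per-switch loads; the length of the list is the number of
    switches that have to stay active.
    """
    loads = []
    for flow in sorted(flows, reverse=True):
        if flow > capacity:
            raise ValueError("flow exceeds the capacity of a single switch")
        for i, load in enumerate(loads):
            if load + flow <= capacity:
                loads[i] += flow
                break
        else:
            loads.append(flow)   # no active switch has room: wake another one
    return loads


if __name__ == "__main__":
    flows = [600, 600, 600]                      # Gbps
    print(estimate_by_division(flows, 1000))     # estimate from the division
    print(len(first_fit_decreasing(flows, 1000)))
```

The example shows why the division alone is not sufficient: three flows of
600 Gbps fit into two switches by volume, but they cannot be packed into fewer
than three switches without splitting a flow.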

In the second phase, we borrow the recently proposed multipath routing
protocol MPTCP to route all the flows. Compared to the single-path routing of
each flow in randomized load balancing techniques, MPTCP can establish
multiple subflows across different paths between the same pair of endpoints
for a single TCP connection. Randomized load balancing may fail to distribute
traffic evenly because random selection causes hot-spots to develop, where an
unlucky combination of random path selections causes a few links to be
overloaded and links elsewhere to have little or no load. By linking the
congestion control dynamics on multiple subflows, MPTCP can explicitly move
traffic away from the more congested paths and place it on the less congested
paths. A sophisticated implementation of MPTCP in data centers can be found in
[?]. The unused switches are put into sleep or some other power-saving mode
where little power is needed to maintain the state. Due to the way we assign
VMs, the network state remains the same most of the time, and very few state
changes are needed, on only a small number of switches. According to the
routes of flows, the routing tables are generated and sent to the
corresponding switches at runtime by a centralized controller, relying on the
OpenFlow support in the switches.

Algorithm 2 Two-Phase Energy-Efficient Routing

Input: data center topology $G = (V, E)$, set of servers $S$ and VM assignment
Output: routes of flows

 1: for $t \in [t_1, t_\Gamma]$ do
 2:    Obtain the traffic flows on the network at time $t$ according to the
       VM assignment
 3:    for $i \in [1, N_p]$ do
 4:       Estimate the number $N_a^i$ of aggregation switches that will be
          used in the $i$-th pod
 5:       Choose the first $N_a^i$ aggregation switches in this pod
 6:    end for
 7:    Estimate the number $N_c$ of core switches that will be used and
       choose them
 8:    Use multipath routing to distribute all the flows evenly on the
       network formed by the selected switches
 9:    Turn the unused switches into sleep mode
10: end for

6 Experimental Results 

In this section, we provide the detailed experimental findings. We associate a
cost function with the switches in real data centers, implement our VM
assignment and network routing algorithms presented in the previous sections,
and compare the energy consumption against the solutions obtained by commonly
used greedy VM assignment and multi-path routing.

6.1 Experimental Settings 

We build a simulator with our algorithms implemented in Python and run all
the simulations on a desktop with an Intel Core 2 T8700 CPU and 4 GB memory.
We choose the following parameters for our model in the experiments. We use
fat-trees as the topologies of data centers. We assume that the VMs used in
all the jobs are identical and each server can host two VMs. For all the
switches in a data center, we use a uniform power function
$f(x) = \sigma + \mu x^{\alpha}$ and we set $\sigma = 200$ Watts,
$\mu = 1 \times 10^{-4}$ and $\alpha = 2$ ($x$ uses Gbps as the unit). Assuming
the processing speed of a switch is limited to 1 Tbps, the maximum power
consumption of one single switch is 300 Watts. These parameters are chosen
based on the real statistics of commodity switches, as we have discussed at
the beginning of Section 4, and our assumption $R^* > C$ is also maintained.
We select a time period of 100 minutes (with the minute as the unit of time)
as our period of interest and assume that during these 100 minutes a set of
jobs needs to be processed in a data center. The jobs are generated randomly,
where the number of VMs each job requests follows a normal distribution
$\mathcal{N}(K, 0.5K)$ ($K$ is the size of a rack). We associate each job with
a communication-intensive time interval which is uniformly distributed in the
100 minutes. In each minute of this communication-intensive interval, a
traffic matrix is provided to indicate the traffic pattern between each pair
of the VMs of that job. The traffic loads in this matrix are generated
following a normal distribution $\mathcal{N}(50, 1)$ Mbps.

Figure 4: Energy consumption using single-path routing and ECMP routing
algorithms in a data center network with 16 servers connected by a fat-tree
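
The synthetic workload described in this subsection could be generated along
the following lines. This is a hedged sketch, not the authors' simulator: the
function name is hypothetical, the second parameter of each normal
distribution is interpreted as a standard deviation, the communication
interval is drawn by picking a uniform start minute and then a uniform end
minute, and the traffic matrices are assumed symmetric.

```python
import random


def generate_job(rack_size, horizon=100, rng=None):
    """Generate one synthetic job with per-minute traffic matrices (Mbps)."""
    rng = rng or random.Random()
    # Number of VMs ~ N(K, 0.5K), clamped so that a job has at least two VMs.
    n_vms = max(2, round(rng.gauss(rack_size, 0.5 * rack_size)))
    # Communication-intensive interval [start, end] within the horizon.
    start = rng.randrange(horizon)
    end = rng.randrange(start, horizon)
    traffic = {}
    for t in range(start, end + 1):
        matrix = [[0.0] * n_vms for _ in range(n_vms)]
        for x in range(n_vms):
            for y in range(x + 1, n_vms):
                # Pairwise load ~ N(50, 1) Mbps, never negative.
                load = max(0.0, rng.gauss(50.0, 1.0))
                matrix[x][y] = matrix[y][x] = load
        traffic[t] = matrix
    return {"n_vms": n_vms, "start": start, "end": end, "traffic": traffic}


if __name__ == "__main__":
    job = generate_job(rack_size=16, rng=random.Random(0))
    print(job["n_vms"], job["start"], job["end"])
```

Passing a seeded `random.Random` instance makes a run reproducible, which is
convenient when different assignment and routing schemes are compared on the
same set of jobs.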

We select benchmarks to show the efficiency of our algorithms. To evaluate
the efficiency of our VM assignment algorithm, we compare our results to the
greedy assignment, which assigns VMs to servers one by one such that the
resources requested by the VMs can be satisfied and which is commonly used in
production data centers. To evaluate the efficiency of our routing
optimization, standard ECMP would be a natural baseline because our routing
algorithm is also built on multi-path routing. However, in order to get the
exact energy consumption results, the standard ECMP algorithm needs to perform
the routing procedure entirely and thus runs quite slowly on large-scale
topologies. In order to obtain the results in reasonable time, we choose
single-path (SP) routing to take the place of standard ECMP. Being a
well-established routing approach, single-path routing has been implemented in
most traditional networks and some production data center networks. We
implement single-path routing with the Dijkstra shortest path algorithm. In
order to make our comparison fair, we first study the relation between the
energy consumption of ECMP and SP in small-scale data center networks with a
4-ary fat-tree topology. We tested different amounts of load in the data
center and recorded the energy consumption in each case with both ECMP and SP.
The experimental results are shown in Figure 4. It can be observed that ECMP
consumes more energy than SP all the time, no matter how much load the data
center carries. This is mainly because with the same amount of traffic, ECMP
uses more switches to route it and, according to Theorem 10, this in turn
brings more energy consumption. As a result, it is appropriate to use SP
instead of ECMP to evaluate our algorithm.

Figure 5: Energy saving ratios using different VM assignment methods and
routing algorithms in two data center networks of different scales with (a)
1024 and (b) 3456 servers connected respectively. The ratios are the energy
consumption normalized by that of Greedy-SP

6.2 Efficiency of Energy Savings 

We test our VM assignment and energy-efficient routing (EER) algorithm using
two fat-tree topologies of different scales, with 1024 and 3456 servers
connected respectively. We vary the load of the data centers from 5% to 95%
and compare the energy consumption under different data center utilizations.
For each scheduling instance, we compare 5 values of interest: the greedy
assignment and SP routing solution, the optimized greedy (OptGreedy)
assignment and SP routing solution, the greedy assignment and EER solution,
the energy-efficient assignment (EEA) and EER solution, and the optimized EEA
(OptEEA) and EER solution. The optimized greedy assignment applies the VM to
super-VM transformation of our algorithm before the normal greedy assignment,
and OptEEA is our whole algorithm for VM assignment. The five curves in
Figure 5 (a) and (b) correspond to the ratios of these 5 values, all
normalized by the greedy assignment and SP routing solution. We observe the
following.

a) The energy-efficient routing algorithm can provide substantial energy
savings even when the greedy assignment algorithm is being used. Up to 30%
reduction in network energy consumption can be achieved by applying this
routing optimization in data center networks.

b) The well-designed VM to super-VM transformation is very helpful in
reducing the network energy consumption. As we have discussed, the traffic on
the network is reduced by applying this transformation.

c) The distributed manner of VM assignment in each pod can bring us at least
5% energy savings. Moreover, when combined with the VM to super-VM
transformation and energy-efficient routing, the energy saving ratio can be
as large as 50%. Compared to the energy saving brought by the energy-efficient
routing alone, the whole VM assignment optimization we proposed can bring us
about 20% more savings.

This convinces us that the room for saving energy in current data center 
networks remains quite big, and a huge amount of energy can be saved by not 
only traffic engineering, but also an integrated optimization of applications, VM 
assignment and routing, confirming the advantage of our framework. 

6.3 Running Time 

To ensure that our VM assignment algorithm can be carried out in big data
centers, we have also recorded the running times of the algorithms used in our
experiments. The numerical results are presented in Figure 6. We find that
with the smaller-scale topology our algorithm finishes within one second,
while with the larger-scale topology the running time is within 10 seconds.
The running time of our algorithm is only 50 percent more than that of the
greedy algorithm. We have also tested our algorithm on large-scale topologies
(with tens of thousands of servers); most of the time, the running time is
bounded by 2 minutes, which is quite acceptable in production data centers.



7 Discussion 

The proposed algorithms have been shown to be effective in improving the
energy efficiency of data center networks. We now discuss how to make them
practical with respect to the assumptions we have made before.

To simplify the modeling and analysis, we assumed the single-peak model to
represent the communication patterns of jobs. However, the authors in [?] have
pointed out that the traffic patterns of these jobs can mainly be classified
into three categories: single peak, repeated fixed-width peaks, and peaks of
varying height and width. Recall that in our model, the only place we refer to
this traffic pattern information is the generation of traffic pattern vectors.
Even with the most complicated pattern, this generation process can be adapted
with only a slight change in the expressions.

Figure 6: Running times used by our energy-efficient VM assignment algorithm
and the greedy algorithm in two data center networks of different scales with
(a) 1024 and (b) 3456 servers connected respectively

The model we propose in this paper is suited to offline cases. However, in
production data centers, jobs are likely to arrive and leave dynamically. We
argue that the proposed algorithms can also be applied to online cases because
they require very little information exchange between jobs. One possible
adaptation is the following: for each arriving job, we first apply the VM to
super-VM transformation, and then compute the distances between it and the
other jobs running in the data center. According to the distances, we assign
this job to a pod, after which the rest of our energy-efficient VM assignment
algorithm, as well as the energy-efficient routing, can be directly applied.
We leave a thorough adaptation to online cases as future work.



8 Related Work 

We summarize some related work on network-related optimization in data cen- 
ters, including VM assignment, traffic engineering, as well as energy-efficient 
data center networking. 

8.1 VM Assignment and Traffic Engineering 

Traffic engineering in DCNs has been extensively studied. Due to the
centralized environment of data centers, centralized controllers are broadly
used to schedule or route traffic flows. Al-Fares et al. proposed Hedera [?],
a scalable, dynamic flow scheduling system that adaptively schedules a
multi-stage switching fabric to efficiently utilize aggregate network
resources. Benson et al. [?] proposed MicroTE, a system that adapts to traffic
variations by leveraging the short-term and partial predictability of the
traffic matrix to provide fine-grained traffic engineering for data centers.
Abu-Libdeh et al. [?] realized that providing application-aware routing
services is advantageous. They proposed symbiotic routing to achieve specific
application-level characteristics.

Recently, data center network virtualization architectures such as SecondNet
[?] and Oktopus [?] have been proposed. Both of them consider the virtual
cluster allocation problem, i.e., how to allocate VMs to servers while
guaranteeing network bandwidth. In a recent work, Xie et al. [?] proposed
TIVC, a fine-grained virtual network abstraction that models the time-varying
nature of the networking requirements of cloud applications, to better utilize
networking resources. Meng et al. [?] proposed to use traffic-aware VM
placement to improve the network scalability. Then, they explored how to
achieve better resource provisioning using VM multiplexing by exploring the
traffic patterns of VMs [?]. In a follow-up work [?], they studied how to
consolidate VMs with dynamic bandwidth demand by formulating a Stochastic Bin
Packing problem and proposed an online packing algorithm. Jiang et al. [?]
explored how to combine VM placement and routing for data center traffic
engineering, and provided an efficient online algorithm for it. However, they
did not consider the temporal information of the communication patterns of
the applications or the topology features.

8.2 Energy Efficient Data Center Networking 

Many approaches have been proposed to improve the energy efficiency of DCNs. 
These techniques can usually be classified into two categories. The first is to
design new topologies that use fewer network devices while guaranteeing similar
performance and connectivity, such as the flattened butterfly proposed by Abts et
al. [?] or PCube [?], a server-centric network topology for data centers that
can vary bandwidth availability based on traffic demands. The second is
to find optimization methods for current DCNs. The most representative work
in this category is ElasticTree [?], a network-wide power manager which can
dynamically adjust the set of active network elements, to satisfy variable data 
center traffic loads. Shang et al. [?] considered saving energy from a routing
perspective, routing flows with as few network devices as possible. Mahadevan
et al. [?] discussed how to reduce the network operational power in large-scale
systems and data centers. Recently, Wang et al. [?] proposed CARPO, a
correlation-aware power optimization algorithm that dynamically consolidates
traffic flows onto a small set of links and switches and shuts down unused
network devices. Zhang et al. [?] proposed a hierarchical model to optimize the
power in DCNs, together with some simple heuristics for it. To the best of our
knowledge, this is the first paper to address the power efficiency of DCNs from
a comprehensive point of view, integrating many useful properties of data
centers that we can take advantage of.

9 Conclusion 

In this paper, we study the problem of achieving energy efficiency in DCNs. 
Unlike traditional traffic-engineering-based solutions, we provide a new general 
framework that exploits some unique features of data centers. Based on 
this framework, we define an energy saving problem with a time-aware model 
and prove its NP-hardness. We solve the problem in two steps. First, we 
carry out a purposeful VM assignment algorithm that provides favorable traffic 
patterns for energy-efficient routing, based on three VM assignment principles 
we propose. Then, we analyze the relation between the power consumption 
and routing and then propose a two-phase energy-efficient routing algorithm. 
This algorithm aims to minimize the number of switches that will be used and 
to balance traffic flows among them. The experimental results show that the 
proposed framework provides substantial benefit in terms of energy savings. By 
combining VM assignment and routing, up to 50% of the energy can be saved. 
Moreover, the proposed algorithms can be run in reasonable time and can be 
applied in large-scale data centers. 
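
The two goals of the routing step summarized above (use as few switches as
possible, then balance traffic among them) can be illustrated with a small
sketch. This is not the algorithm of this paper: it assumes a deliberately
simplified model in which every flow may be carried by any one of a set of
identical switches with a common capacity, and it swaps in two textbook
heuristics (first-fit decreasing for consolidation, largest-first assignment to
the least-loaded switch for balancing). All names (consolidate, balance,
two_phase) are hypothetical.

```python
def consolidate(flows, capacity, num_switches):
    """Phase 1 (assumed heuristic): first-fit decreasing.

    Packs flow demands onto as few switches as the heuristic manages,
    opening a new switch only when no active one has room.
    """
    active = []  # one list of flow demands per active switch
    for demand in sorted(flows, reverse=True):
        if demand > capacity:
            raise ValueError("a single flow exceeds the switch capacity")
        for carried in active:
            if sum(carried) + demand <= capacity:
                carried.append(demand)
                break
        else:
            if len(active) == num_switches:
                raise ValueError("not enough switches for this heuristic")
            active.append([demand])
    return active


def balance(flows, num_active):
    """Phase 2 (assumed heuristic): largest flow first, least-loaded switch."""
    assignment = [[] for _ in range(num_active)]
    loads = [0] * num_active
    for demand in sorted(flows, reverse=True):
        i = min(range(num_active), key=loads.__getitem__)
        assignment[i].append(demand)
        loads[i] += demand
    return assignment


def two_phase(flows, capacity, num_switches):
    """Consolidate first, then rebalance over the active switches only.

    If rebalancing would overload a switch, keep the phase-1 packing,
    which is feasible by construction.
    """
    packed = consolidate(flows, capacity, num_switches)
    balanced = balance(flows, len(packed))
    if all(sum(carried) <= capacity for carried in balanced):
        return balanced
    return packed


if __name__ == "__main__":
    result = two_phase([4, 3, 3, 2, 2, 2], capacity=10, num_switches=4)
    for index, carried in enumerate(result):
        print(f"switch {index}: flows {carried}, load {sum(carried)}")
```

In this toy model the remaining switches (those not returned by two_phase)
would be the candidates for being powered off. A real DCN would additionally
need per-link capacities and multi-hop paths, which the sketch ignores.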